An Utterance Recognition Technique for Keyword Spotting by Fusion of Bark Energy and MFCC Features
Abstract
This paper presents preliminary results for a keyword spotting system that fuses spectral and cepstral features. Spectral energy in 16 frequency bands on the Bark scale and 16 mel-scale warped cepstral coefficients are used both independently and in a weighted combination to recognize word utterances. Matching the features with Euclidean and cosine distances in a dynamic time warping (DTW) process shows that cosine distance works better for Bark energy features, while weighted Euclidean distance better captures the closeness of utterances in the cepstral domain. In both cases, DTW achieves an accuracy of better than 81 percent across different speakers, and fusing the two feature sets raises the score to over 86 percent. Both results are based on a small subset of utterances from the Call Home database.
Keywords: Speech recognition, Bark energy, Mel cepstrum, Feature fusion, Dynamic time warping.
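The matching scheme summarized in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the length normalization, the fusion weight `alpha`, and the per-coefficient `mfcc_weights` are all assumptions, and the abstract does not state how the two scores were actually weighted or scaled. Feature extraction (Bark band energies and MFCCs) is assumed to have been done already, so each utterance is a list of per-frame feature vectors.

```python
import math

def euclidean(a, b, weights=None):
    """Weighted Euclidean distance between two feature vectors."""
    if weights is None:
        weights = [1.0] * len(a)
    return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))

def cosine_distance(a, b):
    """One minus cosine similarity; near 0 for parallel vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 1.0  # assumption: treat an all-zero frame as maximally distant
    return 1.0 - dot / (na * nb)

def dtw_cost(seq_a, seq_b, dist):
    """Length-normalized DTW alignment cost between two frame sequences."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # acc[i][j] = best accumulated cost aligning the first i and j frames
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(seq_a[i - 1], seq_b[j - 1])
            acc[i][j] = d + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    return acc[n][m] / (n + m)

def fused_cost(bark_a, bark_b, mfcc_a, mfcc_b, alpha=0.5, mfcc_weights=None):
    """Weighted sum of the Bark (cosine) and MFCC (Euclidean) DTW costs.

    alpha is a hypothetical fusion weight; the two costs live on different
    scales, so in practice they would need normalizing before being combined.
    """
    bark = dtw_cost(bark_a, bark_b, cosine_distance)
    mfcc = dtw_cost(mfcc_a, mfcc_b, lambda x, y: euclidean(x, y, mfcc_weights))
    return alpha * bark + (1.0 - alpha) * mfcc
```

A keyword would then be recognized by computing `fused_cost` between the test utterance and each stored template and picking the template with the lowest cost.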
Related Articles
Confidence Measure for Utterance Verification in Keyword Spotting System
In this article, we propose an utterance verification technique for keyword spotting. The keyword spotting system analyzes given spoken content and searches for every speech segment in which one of the predefined keywords is uttered. To maintain a stable recognition performance in the system, we propose an utterance verification technique that verifies whether a found utterance, or a candidate keywo...
Recognizing the Emotional State Changes in Human Utterance by a Learning Statistical Method based on Gaussian Mixture Model
Speech is one of the most opulent and instant methods to express emotional characteristics of human beings, which conveys the cognitive and semantic concepts among humans. In this study, a statistical-based method for emotional recognition of speech signals is proposed, and a learning approach is introduced, which is based on the statistical model to classify internal feelings of the utterance....
Keyword Spotting Based On Decision Fusion
Automatic speech recognition (ASR) technology is now available in all handsets, where keyword spotting plays a vital role. Keyword spotting performance degrades significantly in real-world environments due to background noise. Because visual features are not much affected by noise, they provide a better solution. In this paper, audio-visual integration is proposed which combines audi...
Robust Keyword Spotting Using a Multi-Stream Approach
Speech recognition systems are prone to severe degradation in noisy environments due to mismatch between training and testing conditions. A multi-stream approach for keyword spotting is proposed to improve robustness in mismatched conditions. The assumption is that most real world noises are colored and do not affect the full spectrum equally, meaning certain parts of the spectrum can still pro...
Speech Emotion Recognition Using Residual Phase and MFCC Features
The main objective of this research is to develop a speech emotion recognition system using residual phase and MFCC features with an autoassociative neural network (AANN). The speech emotion recognition system classifies the speech emotion into predefined categories such as anger, fear, happy, neutral or sad. The proposed technique for speech emotion recognition (SER) has two phases: Fe...